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Design  and  Implementation  of  the  Nested  Relational  Data  Modi  1 
Under  The  Exodus  Extensible  Database  Svstem 


/.  Introduction 


1. 1  Background 

LI.  I  Historical  Perspective.  A  large  segment  of  the  population  is  Inuoining  in(  i<'a>- 
inglv  aware  of  just  how  mudi  data  and  information  processing  continues  to  revolutionize 
everyday  living.  The  complex  technologies  of  today  and  tomorrow  continue  to  advance  the 
means  of  manipulating  electonic  data  in  its  many  forms  to  achieve  some  end,  F<ir  exam¬ 
ple,  an  individual  can  now  shop  for  per.sonal  goods  and  services  with  the  help  of  a  hoitie 
computer,  while  businesses  can  ise  past  ales  data  to  develop  future  strategies.  In  either 
case,  large  amounts  of  data  are  involved  and  require  effective  and  efficient  proci'ssing  with 
the  efforts  of  some  type  of  dataltase  system. 

.\  majority  of  exi.sting  dataha.se  systems  provide  the  features  to  retrieve,  modily. 
and  manipulate  data  within  th<>  well-defined  rules  of  the  relational  dat;i  modtd  (s).  Tlmv 
revolve  around  the  concept  of  files  and  records,  vital  entities  in  today's  hist-paced  business 
transaction  environment.  Because  files  and  records  map  extretneiy  well  into  the  lalml.ir 
format  of  the  relational  model,  software  developers  have  aggressively  itursued  ilie  der'inn 
of  complex,  integrated  databa.se  systems  to  accomodate  this  large,  expanding  market. 

The  relation  is  regarded  as  a  table  of  records  with  cotnmon  [iroperties  or  at  t  rilnii e-. 
These  common  attributes  are  the  fields  defined  on  a  record  within  t  he  relat  ion.  Sevm  al  op¬ 
erations  may  be  performed  on  relations  to  extract  or  modify  the  data  within  the  dat.iba^e. 
The  operations  are  firmly  ba.sed  in  the  relational  algebra  and  calculus,  and  tlie<e  mat  he 
tnatical  concepts  continue  to  provide  the  foundation  for  further  work  in  relationai  ilatahase 
models. 

The  wide  acceptance  of  the  relational  motiel  provides  tlu'  framework  lor  mo>i  mh 
cessful  database  systems  and  an  enterprise  designing  a  database  for  its  particuhtr  mnnis 
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must  adhere  to  the  model  and  its  prescribed  operations.  .\lthou)z,h  not  a  lieon,  m  i, ,i 
enterprises  that  automate  record-type  accounting  proce<lures,  the  model  m,!  mi'et iim  ■  c, 
needs  of  potenti  .  users  seeking  non-traditional  database  support. 

1.1.2  .\'ouiraditional  Datalxise  Applications.  A  growing  lU'ed  for  data’,  .ase  ,-,y:>iein> 
is  spreading  from  the  traditional  applications,  as  mentioned  above,  to  what  are  commonly 
known  as  non-traditional  applications.  Many  research  and  engineering  enierpri.->es,  uiih 
their  propensity  to  collect  and  manipulate  data,  arc  recogidzing  the  need  for  daiaba'e 
management  systems.  However,  many  are  forced  to  design  ad  lioc  or  customized  database 
systems  for  their  particular  applications,  and  the  design  normally  rcvolv('s  around  .>ome  Hat 
file  structure  with  limited  operations.  In  essence,  the  relational  model  and  its  operations 
are  not  providing  an  adequate  vehicle  u^on  which  non-traditional  a[)plications  tan  be 
built.  Computer-aided  design/compuler-aided  manufacturing/computor-aich'd  engineering 
( C.\D/C.-\.M/C.\E).  forms  management,  and  imagery/voice  data  are  just  a  few  I'.'vamitles 
of  non-traditional  database  applications. 

1.1.3  Extending  the  Relational  Model.  To  accomodate  this  growing  area  of  databa.-e 
system  design,  attempts  have  been  underway  over  the  past  decade  to  (>.\pand  llu'  c.tpa- 
bilitier  of  the  relational  model.  With  the  growing  popularity  of  object-oriented  [)rogram- 
ming,  several  commer^  ial  products  are  designing  abstract  data  typing  (.M)  !')  facilities 
in  an  attempt  to  design  database  systems  for  these  non-traditional  applications,  liy  al¬ 
lowing  the  formulation  of  ADTs,  database  designers  are  adding  new  op('raii(,ns  for  the-e 
non-traditional  database  concerns,  to  more  readily  model  the  real  world.  In  addition, 
object-oriented  databases  axe  introducing  the  notions  of  inheritance  and  mess.age.])a.sung 
to  enhance  the  object-oriented  environment  (.'I). 

1.1.4  Separating  the  .Architecture  from  the  Data  Model.  This  lia.s  not  gone  unno- 
tiied  by  database  designers,  hhe  advent  of  new  data  models  beyond  tin-  context  ol  the 
rehational  model  has  forced  a  re-evaluation  of  database  design.  Sevf'ral  systimis.  most  of  .01 
ob jer  t-oriented  flavor,  have  departed  from  the  notion  of  one  model  and  one  ;ircliite<  t  ure. 

I  hese  new  design  packages  take  a  toolkit  approach  toward  the  design  of  databa.se  systems. 
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They  provide  the  basic  necessities  that  a  database  system  requires;  the  .st/>ra»e  .ind 

the  programming  facilities  to  develop  the  system  software.  Several  of  thes(>  curriuit  <  iht'mi 
systems  under  development  include  POSTGRES.  GEMSTONE,  GENESIS,  and  E.XODl'S 
(d). 


1.1.5  The  Sested  Relational  .\fodel.  The  nested  relational  model  is  an  i‘,xten>ion  id 
the  traditional  relational  model.  Besides  retaining  the  traditional  relational  opiTai ion>. 
most  research  efforts  point  toward  the  addition  of  two  more  operators;  nest  and  nnii  '^i. 
With  the  augmentation  of  these  operators,  the  nesting  of  relations  within  relaiiuns  i^ 
t lieoretically  possible.  That  is.  the  attributes  defined  on  some  relation  may  actuall;  In’ 
set-valued  or  relation- valued  domains.  The  nested  relational  data  model  aiipcars  to  he 
a  possible  alternative  to  support  non-lraditional  database  system  applications  and  the 
architectures  mentioned  above  provide  the  basic  means  to  design  and  implemiMit  such  a 
.system. 

1.3  Purpo.'^e  of  Thesis 

This  thesis  will  use  the  FTXODUS  architecture  to  implement  the  lu'sied  rel.iiional 
data  model.  EXODUS,  an  acronym  for  Extensible  Object-oriented  Datahasi'  Sy>i.  iii.  i> 
the  property  of  the  University  of  Wisconsin  and  is  an  ongoing  research  project  wiildn 
tlie  university's  computer  science  department.  The  e.xtensibility  of  E.XODUS  lies  in  ii> 
jirovision  of  the  necessary  tools  and  components  to  develop  application-specific  database 
systems.  The  system  compiler,  u.sed  for  the  compilation  of  system  software,  is  h.aseii  on 
the  E  programming  language,  an  extension  of  the  object-oriented  programming  lamruaiii' 
G-t-  +  . 

The  present  direction  of  database  design  actually  strips  away  tnany  of  t  he  compom’iit  s 
and  facilities  commonly  available  with  today's  systems.  In  this  manner,  standardized  tool.-, 
and  components  may  be  implemented  on  top  of  a  particular  architecture  such  as  EXODUS. 
If  generic  or  all-purpose  components  are  already  available  in  the  system’s  librarif's.  tin' 
database  engineer,  or  DBE,  adds  the  necessary  ones  in  a  modular  fasliiot\  to  ;i\()i<i 
inventing  the  wheel.” 
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1.2.1  The  Problem.  Tho  main  objortiveof  this  thesis  effort  is  to  construct  a  wo,  kin» 
nested  relational  database  management  system  utilizing  the  tools  providc'd  bv  the  Kxudus 
architecture.  The  overall  problem,  then,  is  to  accurately  reflect  the  model's  features  wii  hiu 
the  constraints  imposed  by  Exodus.  A  breakdown  of  the  problem  entails  examining  ea(  h 
one  of  the  system  components  and  tailoring  it  to  the  specifics  of  the  lu'sied  reiatioual 
data  model.  Component  considerations  range  from  choosing  a  suitable  relational  aluebra 
to  designing  and  implementing  operator  methods  for  actual  data  extractiem.  Six  primary 
areas  of  concern  to  achieve  the  thesis  objective  are  discussed  below. 

Relational  algebras  for  the  nested  relational  data  model  are  j)rimarily  (‘xtensiniis  of 
the  traditional  relational  data  model  as  first  presented  in  (6).  In  this  o'search.  two  of  ilu' 
c'Xtended  algebras  under  consideration  include  those  found  in  (7)  and  (1.5).  I'he  Colby 
algebra  was  chosen  because  it  has  an  inherent  and  pleasing  approach  in  it.,  interaction 
with  nested  relations,  and  may  prove  less  difiicult  to  implement  under  an  object-oriented 
development  environment.  The  basic  problem,  then,  involves  accurately  transforming  the 
Colby  relational  algebra  into  the  software  domain  defined  by  Exodus. 

The  catalog  manager  is  an  important,  necessary  component  for  any  ty])e  of  databa.'C 
system.  It  is  here  where  information  is  maintained  for  all  databases  and  their  rel.iticjus 
associated  with  the  system.  The  catalog  manager  permits  various  sy>tem  component  to 
access  the  data  dictionary  and  obtain  information  crucial  to  their  own  operation.  I'he  data 
dictionary  consolidates  this  information  into  data  tables  defining  different  aspects  of  the 
database  system  as  a  whole.  .A  system  using  the  nested  relational  data  model  must  account 
for  the  nested  attributes  and  in  which  relations  they  currently  exist.  The  design  prublems 
with  respect  to  the  catalog  manager  and  data  diciionary  center  on  these  tables  and  i  In- 
loutines  which  access  them.  The  design  must  consider  the  number  of  tables  which  mu^t  b,' 
created  for  providing  minimum  system  functionality  as  well  ais  the  attributes  which  (bdinc 
the  properties  of  the  tables.  The  routines  must  allow  all  components  reipiiring  catalog 
support  uniform  access  into  the  data  dictionary. 

7'he  third  area  of  concern  focuses  on  the  structure  of  query  tree.  The  tri'e  reflects  the 
nature  of  the  system's  relational  algebra,  so  the  tree  structure  design  and  the  relatiomil 
algebra  chosen  coincide  very  closely  with  each  other.  Since  Colliy's  algcdira  is  to  be  im 
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plemented,  the  operator  specifications  must  bo  transformed  into  a  struct  uro  rf'cognizod  by 
the  optimizer.  All  query  data  entered  by  way  of  the  parser  must  be  placed  into  the  s[)n((' 
allocated  by  the  nodes  comprising  the  query  tree,  each  node  representing  an  operat(jr  of 
the  relational  algebra.  Because  of  the  recursive  nature  of  the  algebra,  additional  auxil¬ 
iary  nodes  maintain  the  condition  and  navigation  information  required  for  descending  into 
the  nested  structure  of  relations.  These  auxiliary  nodes  represent  the  operator  lists  for 
each  of  the  respective  relational  operators.  Ihe  problem  highlights  the  need  to  ensurf*  tlie 
relaMonaJ  algebra  and  the  tree’s  data  structure  match  the  intended  nature  of  the  algebra. 

To  enable  the  proper  operation  of  the  catalog  manager  with  respect  to  the  data 
dictionary,  and  to  ensure  that  the  relational  algebra  is  properly  translateil  into  a  ([uery 
tree,  another  component  must  be  designed  to  tie  these  entities  together  into  the  system. 
The  parser  provides  the  mechanisms  to  allow  this  type  of  system  functionality  to  occur. 
.Since  Exodus  does  not  provide  any  type  of  parser  design  capabilities,  the  I’.NIX  tool  known 
as  Y.\CC  (for  Yet  .‘\nother  Compiler  Compiler)  will  be  used  for  this  purpose.  The  difficulty 
lies  in  establishing  the  proper  grammar  rules  for  system  operation  as  a  whole. 

The  fifth  major  area  of  concern  involves  the  generation  of  the  K  source  code  to 
execute  the  query.  .Xit hough  this  is  situated  after  the  query  optimizer,  the  optimizer  will 
not  be  of  concern  in  this  project.  Because  of  its  expected  complexity,  a  doctoral  student 
is  developing  the  optimizer  component.  Upon  exiting  the  (piery  optimizer,  tlu'  (piery 
tree  has  become  a  plan  tree,  structured  in  such  a  way  to  efficiently  access  the  dat.ihase 
with  respect  to  the  current  query.  Becau.se  the  data  structure  is  unitpie  to  the  Colby 
algebra,  routines  must  be  developed  to  accurately  walk  the  plan  and  its  auxiliary  list  and 
condition  structures,  inspecting  the  information  and  generating  the  recpiired  source  code. 
Once  all  of  the  source  code  is  generated,  the  routines  dynamically  link  its  object  code  to 
the  implemented  operator  methods.  The  problem,  therefore,  is  twofold.  First,  accuratidy 
generate  the  E  code  and,  second,  link  it  with  the  required  operator  metluKls  to  prodinc 
the  e,xecutable  module. 

The  final  problem  focuses  on  the  structure  of  the  relation,  which  must  logically  and 
physically  allow  for  the  nesting  of  relations.  Procedural  programming  languages  hav(>  not 
traditionally  permitted  such  structures,  and  DBEs  have  had  to  resort  to  ad  hoc  measun's 


to  achieve  this  end.  However,  since  E  is  a  database  system  lang"age,  mechanisms  added  to 
C+-f  appear  to  bridge  the  gap  between  the  logical  and  physical  nature  of  iie^t(>d  relations. 
This  effort  will  take  advantage  of  these  features  to  design  and  implement  nested  structures. 

1.2.2  Scope  of  the  Thesis.  By  indicating  the  six  problem  areas  of  designiii”;  and 
implementing  the  nested  relational  data  model  within  the  Exodus  framework,  the  .sc(j|)e 
of  this  thesis  effort  was  narrowed  to  achieve  this  objective.  The  data  dictionary  contains 
enough  information  to  allow  proper  operation  of  the  database  .system,  while  the  catalog 
manager  has  the  routines  needed  to  access  the  data  tables.  The  routines  are  able  to  he 
invoked  by  any  other  component  in  the  system.  Project,  select,  and  natural  join  are  the 
operators  implemented  within  the  databa.se  The  parser  is  able  to  parse  the  (pit'ry  into  tlie 
proper  tree  structure  as  well  as  allow  for  the  creation  of  nested  relations  by  the  u.ser.  The 
walking  routines  are  able  to  walk  the  plan  tree  and  generate  the  proper  E  code,  dynamically 
linking  the  query  to  the  operator  methods.  The  scope  outlined  above  provides  a  minimum 
functionality  for  the  nested  relational  database  system. 

1.3  Sequence  of  Presentation 

First,  the  nested  relational  data  model  is  examined  in  Chapter  II.  followed  by  a 
discussion  of  the  Exodus  Extensible  Architecture  in  Chapter  Ill.  .After  presenting  this 
background  information,  the  design  and  implementation  of  the  data  model  under  Exo¬ 
dus  is  demonstrated  in  Chapter  IV.  Finally,  conclusions  are  provided  on  this  effort,  with 
recommendations  for  future  research. 
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11.  The  .\ested  Relational  Data  .Model 


2. 1  Introduction 

The  relational  data  model  is  the  primary  vehicle  for  most  of  today's  sophisticated 
database  management  systems.  The  model  is  defined  within  sound  mathematical  princi¬ 
ples,  and  it  is  this  very  foundation  which  permits  accurate  manipulation  of  relations  and 
their  corresponding  data  members.  By  understanding  and  applying  the  rehitional  func¬ 
tions  to  these  abstract  structures,  an  appreciation  of  the  model's  .strengths  d(>monst rates 
an  increasing  role  for  further  e.xtension.  New  data  models  are  a  restilt  f)f  these  e.vten.sions. 
including  the  nested  relational  data  model. 

Since  the  relational  data  model  lends  cred.-nce  to  the  nested  relational  data  model, 
features  unique  to  this  traditional  model  must  initially  be  examined.  .\  constructive  ap¬ 
proach  of  this  type  attempts  to  eliminate  misconceptions  of  the  nested  relational  data 
model.  Once  an  appropriate  amount  of  background  is  presented  for  the  traditional  rela¬ 
tional  data  model,  the  properties  of  the  extended  version  of  this  model  will  be  discussed. 

2.2  The  Relational  Data  .Model 

The  structure  of  the  relational  data  model,  sometimes  prefixed  by  the  "conventional" 
or  “traditional”  adjectives,  may  be  visualized  in  two  different  forms.  First,  the  logical 
nature  of  the  data  model  is  the  most  recognizeable  structure  in  the  object-oriented  arena, 
indicating  some  particular  entity  with  its  various  properties.  The  abstractitess  of  the 
model  aJlows  the  mapping  of  traits  of  real-w'orld  objects  into  the  software  domain,  where 
the  objects  may  then  be  invoked  to  behave  as  if  in  the  real-world.  Second,  the  |)hysical 
nature  of  the  data  model  attempts  to  separate  these  traits  into  a  straightforward,  tabular 
format.  Instead  of  abstractly  visualizing  an  object  with  specific  properties  unique  to  the 
object,  the  physical  table  or  relation  permits  actual  object  data  to  be  compared  to  other 
objects  with  similar  properties. 

A  table  of  values  is  the  most  likely  representation  of  an  object  modeled  under  the 
relational  data  model.  Rows  and  columns  are  essential  elements  of  a  relation,  where  each 
column  indicates  a  particular  attribute  of  an  object,  while  each  row  represents  an  entire 
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instance  of  an  object  defined  by  the  relation.  Attributes  must  rely  on  already  defined  type-, 
such  as  the  integer,  float,  or  character  types  available  in  most  software  environments.  Of 
course  these  types  may  represent  other  types  or  instances  in  a  higher  abstract  sense.  1  or 
instance,  the  integer  A’  is  commonly  a  stand-in  for  the  Boolean  variable  'TRUE',  while  O' 
reverts  to  the  Boolean  ‘FALSE’.  In  this  case,  integers  are  used  for  the  Boolean  type. 

.\n  example  of  a  relation  should  help  emphasize  the  logical  and  physical  nature  it 
attempts  to  model.  From  a  logical  standpoint,  the  relation  represents  some  object  in 
the  real-world  domain.  An  often  used  object  is  person  since  its  properties  may  easily  lie 
understood  and  subsequently  derived.  To  narrow  down  the  properties  attributable  to  any 
one  person,  the  actual  person  object  may  be  scaled  down  to  a  child  object,  where  the 
child  object  most  likely  has  ties  to  some  other  object  such  as  an  employee  object.  That 
is,  the  relation  defines  the  attributes  of  children  belonging  to  employees  of  some  particular 
corporation. 

To  relate  each  of  the  child  object  instances  to  employee  objects  presumably  residing 
elsewhere  in  the  system,  one  child  attribute  must  be  able  to  directly  link  its  object  instances 
to  those  employee  instances  having  children.  For  this  purpose,  the  emp.ssn  attribute, 
representing  an  employee’s  social  security  number,  becomes  one  of  the  key  attributes  which 
provides  uniqueness  to  the  child  objects.  Other  attributes  of  a  child  include  name,  aye, 
and  sex.  From  the  countless  possibilities  which  may  define  a  child,  the  database  requires 
only  a  few  attributes  central  to  the  application  for  which  the  database  was  developed. 

From  a  physical  standpoint  the  relation  consolidates  the  attributes  of  the  object  into 
a  coherent  structure.  Columns  of  the  table  or  relation,  shown  in  Figure  2.1,  represents  each 
of  the  child  attributes,  while  each  of  the  rows  or  tuples  are  the  specific  children  instances. 
In  this  example,  there  are  ten  instances  of  the  child  object.  The  attributes,  as  previously 
discussed,  must  be  one  of  the  basic  types.  The  emp_ssn  attribute  may  bo  an  array  of 
characters  or  strings,  or  it  can  be  regarded  as  an  integer  with  the  hyphens  inserted  at  tin' 
time  of  extraction.  In  the  same  way,  name  is  a  string,  age  is  an  integer,  and  sex  is  a  singl<> 
character,  ‘M’  or  ‘F’. 

The  operators  of  the  relational  algebra  do  not  specifically  model  the  behavior  of  the 
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234-61-9825 

.\nele 

1 

F 

Figure  2.1.  The  child  relation. 


object  itself,  but  rather  they  permit  data  to  be  extracted  from  the  object  or  modified 
within  it.  Although  this  is  a  shortcoming  for  an  object-oriented  development  environment 
where  real-world  object  properties  and  behaviors  are  mapped  into  the  software  domain, 
the  relational  operators  are  well-suited  for  particular  applications.  Most  of  these  are  ac¬ 
counting  and  transaction-oriented  applications  which  have  made  the  relational  data  model 
an  indispensible  tool  in  the  work  place. 


2.2.1  The  Relational  Operators.  A  number  of  operations  in  the  relational  algebra 
are  possible  on  relations,  including: 


•  Project 

•  Select 

•  Natural  Join 

•  Cartesian  Product 

•  Set  Operations 


The  first  three  operators  are  the  primary  elements  necessary  for  extracting  data  from 
relations.  The  cartesian  product  operator  is  a  generalization  of  the  natural  join,  while  the 
.set  operations  include  union,  intersection,  difference,  and  divide,  .\lthough  the  cartfsinn 
product  operator  and  the  set  operations  provide  significant  functionality  to  user  queries. 
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the  first  three  permit  the  necessary  extraction  of  data  from  relations.  Codd  ((>)  and  Kortli 
( 11)  provide  a  good  explanation  of  each  of  these  operators. 

A  database  populated  by  relations  normally  representing  objects  integral  to  one  an¬ 
other  in  some  fashion,  may  temporarily  force  these  objects  together  using  the  join  o[)erator 
and  pulling  out  instances  according  to  some  criteria  or  obtaining  a  subset  of  the  properties 
of  the  objects.  In  other  words,  the  three  operators  are  used  in  various  combinations  to 
extract  certain  data  elements  from  all  the  existing  relations.  The  conglomeration  of  ob¬ 
jects  in  the  database  defines  some  environment  or  entity  such  as  a  corporation  which  may 
include  items  such  as  a  product  relation  and  a  resources  relation  in  addition  to  the  already 
defined  child  and  employee  relations. 

Because  many  relations  within  a  particular  environment  are  interrelated,  a  b<'tter 
approach  is  to  combine  some  of  the  relations  into  a  single  one.  I'his  would  alleviate 
using  the  join  operator  for  two  or  more  relations  which  are  joined  for  a  majority  of  queries 
involving  them.  However,  with  the  voluminous  amounts  of  data  that  will  riiost  likely  result 
from  such  an  endeavor,  it  is  better  to  keep  the  relations  as  separate  entities  to  facilitate 
more  logical  query  formulation.  However,  these  inter-relationships  cannot  necessarily  be 
ignored  either.  Some  method  or  approach  must  permit  a  more  logical  structure  to  account 
for  these  dependencies  without  altering  the  basic  functionality  of  the  three  operators. 

2.3  The  Nested  Relational  Data  Model 

Much  of  the  overall  structure  and  behavior  of  the  relational  data  model  is  essential 
to  the  underlying  framework  of  the  nested  relational  data  model.  The  relational  data 
model  considers  its  objects  as  separate  entities  organized  in  a  horizontal  fashion.  The  join 
operator  links  the  relations  whenever  query  processing  requires  a  combination  of  the  data. 
However,  a  vertical  orientation  is  more  appropriate  in  many  of  these  cases.  By  permitting 
only  one  level  of  abstraction  within  a  database  environment,  the  real-world  environment 
which  a  relational  database  represents  is  somewhat  skewed.  \  good  possibility  exists  that 
some  of  the  relations  should  actually  be  subordinate  to  other  relations  to  properly  nuxb'l 
the  real-world  objects.  This  is  also  known  as  aggregation.  Tables  situated  horizontally  are 
considered  flat  relations  and  are  in  first  normal  form  (INF).  Their  data  is  in  a  nondecom 


posable  or  atomic  state.  On  the  other  hand,  if  the  relations  were  to  model  the  real-world, 
the  database  would  have  to  become  more  vertical,  placing  relations  within  relations. 

Traditional  relations  place  their  emphasis  on  Hat  tables  and  the  relational  operators, 
while  OOD  considers  objects  and  the  functions  that  may  operate  on  the  objects.  The  at¬ 
tempt  is  to  model  the  behavior  within  the  database  system.  Nested  relations,  as  composite 
objects  (10),  require  certain  data  members  and  functions.  ,\n  environment  mii.st  ixMinit 
the  data  members  within  the  object  to  become  the  atomic  and  relation-valued  attributes, 
while  the  functions  modeling  an  object’s  behavior  must  be  found  in  its  member  fnnction.s. 
The  relational  operators  can  then  also  be  considered  objects,  requiring  data  members  to 
be  in  the  form  of  nested  relations  and  member  functions,  when  used  in  unison,  to  perform 
the  operation.  This  is  the  concept  behind  OOD  within  the  database  system. 

By  inserting  relations  within  relations,  this  vertical  orientation  suggests  the  nesting 
of  relations.  .According  to  this  scheme,  then,  a  top-level  relation  represents  some  particular 
real-world  object,  where  its  composition  is  based  on  subobjects.  These  subobjects  may 
either  be  basic  data  types  that  cannot  be  broken  down  any  further,  or  they  may  be  relations 
which  are  further  comprised  of  their  own  particular  attributes.  In  th.is  iiested  structure', 
attributes  are  either  nondecomposable  or  atomic  attributes  or  they  are  actually  relations, 
commonly  referred  to  as  relation-valued  attributes.  This  nested  relation  struct\ire.  as 
opposed  to  INF  relations,  is  known  as  non-first  normal  form  (-ilNF). 

2.3.1  The  Nested  Relation.  Since  nested  relations  are  comprised  of  relations  and 
atomic  attributes,  the  basic  relational  structure  is  still  the  primary  building  block.  How¬ 
ever,  the  nested  relational  data  model,  to  form  its  very  structure,  must  provide  a  means 
to  insert  relations  in  addition  to  the  basic  relational  operators.  Before  discussing  the 
mechanics  of  these  operations,  an  example  is  in  order. 

The  nested  structure  may  be  illustrated  using  the  two  flat  relations  found  in  Fig¬ 
ure  2.1  and  Figure  2.2.  To  eliminate  the  joining  of  the  child  and  employee  relations  along 
their  common  attribute,  emp_ssn,  before  applying  the  other  two  operators,  the  child  rela¬ 
tion  can  be  inserted  as  a  relation-valued  attribute  of  the  employee  relation.  In  Figure  2.d, 
the  employee  relation  inserts  the  child  relation.  Now,  all  other  attributes  within  the  <  ni- 
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Washington.  A.B. 
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45 
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Engineering 

Carter,  G.H. 

38 
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Figure  2.2.  The  employee  relation. 
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Figure  2.3.  The  employee  nested  relation. 


ployee  relation  are  atomic,  while  all  attributes  within  the  children  subrelation  are  also 
atomic-valued.  In  this  case,  one  level  of  nesting  exists  within  the  employee  relation. 

In  Figure  2.3,  the  emp_ssn  disappears  within  the  children  attribute  since  this  infor¬ 
mation  exists  within  the  employee  relation.  What  once  required  a  join  operation  and  ten 
tuples,  is  now  condensed  into  one  relation  of  four  tuples.  These  four  tujiles  are  actually 
supertuples,  since  the  children  attribute  is  a  set  ot  tuples  or  subtuplcs  for  every  tuple  of 
the  employee  relation. 

2.3.2  The  Relational  .Algebra.  Because  the  relational  structure  plays  an  iittegral 
part  in  the  nested  relational  data  model,  the  relational  operations  also  function  on  the 
nested  structure,  although  in  an  extended  form.  .Although  there  are  several  versions  of 
the  extended  relational  algebra  to  choose  from,  the  Colby  relational  algebra  is  used  in  this 
thesis.  Its  apparent  simplicity  permits  easier  design  and  implementation  into  the  query 
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tree  structure.  The  three  major  relational  operators  -  project,  select,  and  natural  join 
are  supported,  as  well  as  the  set  operations.  Because  the  nested  relational  data  model 
requires  a  method  of  nesting  or  unnesting  a  relation  at  any  level  of  a  nested  relation's 
hierarchy,  the  nest  and  unnest  operators  are  added  to  the  algebra. 

2.3.2. 1  The  \est  Operator,  .■\lthough  nested  relations  can  be  initially  struc¬ 
tured  at  the  time  of  their  creation,  the  ability  to  nest  attributes  within  an  existing  relation 
should  be  permitted.  This  entails  grouping  attributes  situated  at  the  same  level  of  nesting 
and  invoking  the  operator  to  nest  this  group  of  attributes  one  level  deeper. 

.\ccording  to  Colby,  the  nest  operator  requires  three  elements.  First,  an  attribute 
list  indicates  the  group  of  attributes  to  be  nested  to  some  deeper  level.  The  relation  main¬ 
taining  the  said  attributes  in  the  list  is  the  second  element  required  for  the  operator.  The 
third  element  represents  the  new  relation-valued  attribute  for  which  the  list  of  attributes 
will  be  nested,  ,4s  an  example,  to  nest  the  children  attributes  shown  in  Figure  2.1  into  the 
nested  vei  jion  in  Figure  2.3,  the  mathematical  formulation  is  characterized  in  the  following 
manner: 


u  {  child.name,  child. age,  sex  }  (  employee  )  <  children  >. 

2. 3. 2. 2  The  Unnest  Operator.  This  operator  is  the  dual  to  the  nest  n])erator, 
although  performing  the  reverse  operation.  It  only  requires  specifying  the  relation- vabied 
attribute  that  is  to  be  unnested.  For  example,  to  undo  the  nesting  of  Figure  2.3  and  obtain 
the  relation  in  Figure  2.4  once  more,  the  unnest  operator,  p,  may  be  used  in  the  following 
manner; 


p  (  employee  )  <  children  >. 

2. 3. 2. 3  The  Project  Operator.  The  project  operator  requires  two  elements  to 
project  out  the  requested  attributes  of  a  nested  relation.  A  project  list  permits  the  project 
operator  to  reach  any  attribute,  atomic  or  relation-valued,  and  project  out  the  contents 
of  a  single  column  or  an  entire  subrelation.  The  second  element  is  the  source  relation 


2-7 


dept 

emp_name 

etnp_(ige 

tiiip^s.sn 

rhild_  name 

f'llihi_a(ji  ;  '(  / 

Accounting 

Washington,  A.B. 

33 

192-8.3-746.5 

Bob 

M  1 

Accounting 

Washington,  A.B. 

33 

192-83-7465 

( 'arol 

1  '  F  : 

Engineering 

Carter,  G.H. 

38 

.32.5-96-0127 

Kyle 

10  '  .M  ; 

Engineering 

Carter,  G.H. 

38 

.325-96-0127 

John 

12  1  \\  I 

Engineering 

Carter,  G.H. 

38 

.32.5-96-0127 

Lynne 

6  ^  F  ; 

Development 

Kennedy,  E.F. 

If) 

.5 19- 7.3. 3  790 

•Mike  i  7  .\I  , 

Research 

Lincoln,  C.D. 

44 

234-61-9X2.5 

Loin  j  '<  .M 

Research 

Lincoln,  C.D. 

44 

2.34-61-9825 

Jori'iny  1  M 

Research 

Lincoln.  C.D. 

44 

234-61-9825 

Liffany  [  7  !•' 

Research 

Lincoln,  C.D. 

44  i  234-61-9x25 

Anele  *  1  L 

Figure  2.4.  The  utinested  ver.sion  of  the  t  tuplnyif  ri'lation. 
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or  a  relation  resulting  from  another  query.  From  the  nested  relation  in  Figure  2.4.  thi' 
project  operator  may  be  used  to  project  out  the  emp_name  and  child^uainf  attribute^  m 
the  following  manner: 

tr  ■(  emp_naTne,  children  {  child^name  }  }  (  employee  ). 

The  children  attribute  must  be  specified  in  the  list  to  allow  the  opt-rator  to  des(  I'nd 
to  the  next  level  of  nesting  to  obtain  the  data  values  under  the  c/uld_riamf  attribute,  i  lu' 
resulting  relation  is  shown  in  F'igure  2. .5. 

2.3.2. ^  The  Select  Operator.  The  select  operator  reipiires  elements  similar 
to  the  project  operator  plus  an  additional  element.  Like  the  project  list,  the  si-lect  list 
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also  permits  the  operator  to  descend  to  any  depth  of  the  relation.  F.itlu'r  a  relation  or  .a 
relation  resulting  from  another  query  provide  the  source  tuples  for  the  operator.  Finally 
conditions  or  predicates  must  be  able  to  be  specified  on  the  top-level  atomic  attribute's  or 
on  lower-level  relations. 

For  instance,  to  select  those  tuples  from  the  employee  relation  in  which  an  t'liiploye.’e'.s 
age  is  greater  than  forty  and  the  child’s  age  is  greater  than  fotir  will  reepiire'  the  following 
formulation; 


a  (employee)  [emp.eige  >  -10]  {  children  [rhild.age  >  ■}]  }. 

riie  top-level  condition  must  immediately  follow  the  relation  spi’clfu'd  within  the 
par'.'tit hesis,  while  lower-level  conditions  are  found  in  the  select  list.  The  resulting  rolaiion 
may  be  found  in  Figure  2.6. 

2. 3. 2.0  The  .Wituml  Join.  A  major  difference  between  th<'  tr.ulitional  lel.i. 

tional  algebra  natural  join  and  the  Colby  natural  join  is  the  requirement  of  specifying 
j<un  path.  This  path  indicates  the  attributes  on  which  two  relations  or  relations  resulting 
from  queries  may  be  joined.  The  path  is  basically  the  navigation  tool  tf)  reach  the  nesteil 
attributes  for  the  join. 

.\n  example  of  the  natural  join  involves  the  employee  relation  in  f  igure  2.:i  and  the 
.•^rh(M)l  relation  in  Figure  2.7.  The  .school  relation  maintains  the  student  data  for  all  those 
attending  the  local  school  district.  The  formulation  of  the  join  is  as  follows: 

tx  (employee)  {  children  }•  (school). 

Figure  2.S  is  the  relaticm  resulting  from  the  join. 
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The  the  nested  relational  data  model  obtains  most  of  its  striiciure  and  fiiin  i  Kniality  from  i  In- 
traditional  relational  model.  The  operators  in  the  nested  model  re<)iiire  extensions  to  allow  lie  n: 
to  descend  to  nested  relations.  In  addition  to  extending  the  relational  operators,  ihe  nfsiecl  .oo.l.-l 
al.so  uses  a  nest  and  unnest  operator  to  allow  some  restructuring  of  existing  rel.'itions 
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III.  The  Eiodus  Extensible  Database  .Architect art 


3. 1  Introduction 

As  an  extensible  database  system,  the  Exodus  architecture  is  not  a  <latal)aM'  in 

and  of  itself,  but  rather  an  integrated  package  of  software  tools  required  for  database  de-:i;ii 
and  implementation.  The  tools  encompass  the  development  of  the  primary  compoiieiii 
necessary  to  implement  a  database  system  for  a  specific  application  or  a  particular  data 
model.  For  a  database  system  under  consideration,  a  database  engineer  (  DHK )  must  de-ieii 
and  implement  a  host  of  system  components  as  well  as  a  number  of  auxiliary  functions  to 
satisfy  data  model  or  user  requirements.  In  order  to  understand  the  functionality  of  all 
required  system  components,  the  DBE  must  be  aware  of  how  each  of  the  Kx(k1us  software 
tools  will  assist  in  creating  or  generating  the  system  components  for  the  task  at  hand. 

.An  analysis  of  a  database  system  incorporating  the  nested  relational  data  model 
exhibits  a  close  resemblance  to  the  structure  of  traditional  relational  database  systems. 
To  design  and  implement  such  a  system,  a  number  of  the  same  components  are  reriuirml 
with  similar  characteristics.  However,  an  assessment  of  each  of  the  comitoiumts  reveals  an 
extension  of  conventional  design  features  to  account  for  the  nesting  of  relations.  Under  the 
Exodus  environment,  the  DBE  is  concerned  with  the  following  system  components: 

•  Parser 

•  Catalog  Manager 

•  Data  Dictionary  or  Schema 

•  Query  Optimizer 

•  Query  Compiler 

•  E  Compiler 

•  Operator  Methods 

•  Access  Methods 

•?  1 


J 


Storage  Manager 


Query 


Figure  3.1.  A  typical  Exodus  system  design. 

•  Database 

.A  typical  system  design  may  be  found  in  Figure  3.1. 

An  advantage  of  Exodus  focuses  on  component  renseability,  a  feature  of  oxten.sible 
architectures  permitting  rapid  development  of  other  systems  with  similar  characteristics. 
The  amount  of  implementation  ranges  from  components  produced  via  input-driven  gener¬ 
ators  to  components  entirely  designed  and  implemented  by  the  DUE.  Properly  analyzing 
system  requirements  should  lend  itself  to  indicating  possible  areas  of  component  reuse', 
thereby  eliminating  duplication  of  past  effort. 

The  standardization  of  components  for  database  systems  therefore  provides  some 
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credence  to  software  engineering  principles  during  system  requirements  analysis,  riic  K.xo- 
dus  tools  provide  a  frame  of  reference  from  which  to  compose  and  decompose  the  intended 
database  system  while  ascertaining  its  requirements.  The  building  blocks  may  be  te.sted 
according  to  various  test  ca^es  to  ensure  the  viability  of  a  particular  component  under  some 
requirement  essential  toward  the  make  up  of  the  data  model  or  application.  With  this  in 
mind,  the  nested  relational  data  model  hcis  unique  requirements  which  must  be  taken  into 
account  at  the  component  level  before  assembling  the  entire  system.  A  breakout  of  each 
of  the  Exodus  components  with  respect  to  the  requirements  of  the  nested  relational  data 
model  provides  an  outline  of  the  DBE’s  preliminary  design  tasks  before  advancing  into 
detailed  design  and  subsequent  implementation. 

3.2  The  Parser 

The  main  purpose  of  the  parser  is  to  link  the  user  with  the  database  system.  Specif¬ 
ically,  the  user,  by  way  of  a  data  manipulation  language  and  a  query  language,  may  access 
the  catalog  manager  and  the  query  optimizer.  These  two  components  are  important  in 
their  own  respect  and  will  be  examined  in  upcoming  sections.  The  parser,  by  providing 
the  user-interface,  translates  human-understandable  language  to  one  more  conducive  to 
database  system  management. 

The  user  communicates  through  a  standard  set  of  commands,  issuing  directives  which 
engage  the  mechanisms  of  the  database.  These  commands  are  normally  encapsulated 
into  an  interface  referred  to  as  a  query  language.  A  number  of  query  languages  have 
been  developed  and  implemented  in  traditional  relational  systems,  so  it  is  no  wonder  that 
extensions  of  such  languages  have  been  proposed  for  the  nested  relational  data  model.  Of 
the  relational  query  languages,  SQL  is  the  most  popular  to  date  and  extended  languages 
such  as  SQL/NF  for  nested  relations  provide  a  strong  backdrop  for  implementing  the  nested 
model.  The  DBE  must  choose  an  appropriate  query  language  or  develop  one  reflecting  the 
nature  of  nested  relations.  Once  it  is  chosen,  a  high-level  parser  may  be  designed  and 
implemented  to  incorporate  the  query  language. 

Exodus  does  not  provide  a  parser  or  scanner  as  a  development  tool,  so  an  alterna¬ 
tive  must  be  found.  For  this  purpose,  the  popular  UNIX  tools  of  YACC'  (Vet  .Another 
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Compiler  Compiler)  and  LEX  (Lexical  Analyzer)  are  the  most  commonly  used  tools.  LEX 
supports  the  capability  to  scan  the  input  for  specific  commands  necessary  to  indicate  to 
the  system  what  is  requested.  For  instance,  if  a  command  to  create  a  new  relation  wa.s  war¬ 
ranted,  the  words  create  relation  will  possibly  have  the  associated  tokens  of  T.CRE.A  I  E 
T.RELATION,  signifying  that  a  create  and  relation  token  are  to  be  sent  to  the  parser. 
Since  numeric  and  character  data  are  essential  for  database  operation,  generic  scanner 
formulas  for  floats,  integers,  and  strings  must  be  developed.  However,  since  scanners  with 
this  type  of  data  have  been  developed  before,  their  formulation  will  closely  match  earlier 
representations  found  in  other  scanners.  Other  types  of  input  required  for  scanning  in¬ 
clude  the  equality  and  inequality  operators,  the  set  operators,  data  dictionary  commands, 
and  query  commands.  Tokens  will  have  to  be  returned  for  each  one  of  these  commands 
or  inputs.  Regarded  as  a  single  component,  the  scanner  is  the  first  subcomponent  of  the 
interface  with  the  database  user,  and  tokens  from  user  commands  are  sent  to  the  parser 
subcomponent  for  assembling  the  user’s  wishes  into  database  commands.  In  other  words, 
the  scanner  and  parser  subcomponents  comprise  the  “parser”  component,  since  the  parser 
subcomponent  cannot  operate  without  a  scanning  operation. 

Once  a  token  is  sent  from  the  lexical  analyzer,  the  parser  attempts  to  match  one 
of  the  grammar  rules  containing  the  tokens  within  the  parser.  YACC  uses  a  left  to  right 
resolution  method  to  determine  what  the  user’s  command  entails.  Within  each  of  the 
grammar  rules,  actions  may  be  taken  to  direct  the  database  system  to  carry  out  a  specific 
operation. 

To  illustrate  this  in  more  detail,  an  examination  from  the  earlier  create  relation 
example  is  in  order.  Once  the  T.CREATE  and  T.RELATION  tokens  have  been  sent  to 
the  parser,  the  parser  will  most  likely  have  some  action  to  perform.  In  this  case,  the  action.s 
invoke  procedures  to  allocate  and  input  a  relation.  Once  the  operations  are  completed,  the 
control  returns  to  the  user  by  way  of  the  parser. 

3.3  The  Catalog  Manager 

The  catalog  ma  -.ager  is  the  mechanism  responsible  for  ensuring  proper  organization 
of  relations  within  the  database.  Data  tables  which  identify  and  track  prcviou.sly  creati'd 
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relations  and  their  attributes  are  overseen  by  this  component.  An  essential  task  includes 
routing  relation  and  attribute  identifiers  to  their  correct  position  within  the  data  tables. 
Whenever  a  new  database  is  created  or  a  current  one  opened,  all  references  to  any  of 
the  relations  and  their  associated  attributes  are  handled  by  the  mechanisms  implemented 
by  the  catalog  manager.  The  catalog  manager’s  tables  must  be  linked  in  some  manner 
to  retain  the  correspondence  among  the  various  identifiers  and  permit  an  efficient  and 
accurate  traversal  of  all  tables  for  search  of  fundamental  system  data.  .\  design  decision 
mav  lie  in  how  to  link  the  tables  together.  Methods  such  as  using  pointers  or  ind<‘.Ke.s  to 
match  table  columns  together  are  simple  and  establish  a  clean  connection  between  tables. 

In  comparison  to  traditional  relational  database  system  catalog  managers,  many  sim¬ 
ilarities  exist  for  catalog  management  of  nested  relations.  For  a  nested  relational  database 
system,  the  catalog  manager  has  to  ensure  all  relations  and  nested  relations  are  handled 
reliably  and  without  confusion  as  to  what  is  a  relation  and  what  is  a  relation-valued  at¬ 
tribute  embedded  within  a  relation.  That  is,  nested  relations  cannot  be  confused  with 
top-level  relations  or  schemas.  In  addition,  duplicate  atomic  or  relation-valued  attribute 
identifiers  must  always  be  clearly  differentiated  to  eliminate  any  confusion  which  may  ari.se 
among  the  various  relations  in  the  data  dictionary. 

The  catalog  manager  may  be  designed  in  various  ways,  but  it  should  take  advantage 
of  the  resident  data  model’s  own  structure.  For  instance,  relational  database  systems  nor¬ 
mally  use  relations  as  the  catalog  manager’s  vehicle  for  maintaining  user-defined  relations 
within  the  database  system  itself(l).  With  this  perspective,  the  nested  reln^i.jadl  database 
system  should  take  advantage  of  nesting  certain  attributes  of  its  catalog  tables  to  manage 
database  tables  defined  by  users  of  the  system.  Catalog  manager  tables  can  be  de.signed 
to  maintain  information  of  relations  within  the  database,  maintain  all  the  attributes  of 
every  relation  within  the  database,  and  mriintain  those  relation-valued  attributes  identi¬ 
fying  nested  relations  within  the  database.  Information  required  by  the  catalog  manager 
to  load  its  tables  will  be  obtained  through  the  scanning  and  parsing  of  relation  creation 
commands.  Once  this  information  is  passed  from  the  scanner,  through  tlie  parser,  and  to 
the  catalog  manager,  the  data  must  be  input  into  its  tables  in  a  particular  format  defined 
within  the  system  and  commonly  referred  to  as  the  data  dictionary. 
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3.^  The  Data  Dictionary 

The  data  dictionary  is  a  component  operating  in  conjunction  with  the  catalog  man¬ 
ager  to  support  the  organization  of  symbol  identifiers  and  statistical  information  necessary 
to  maintain  relations  within  the  database  system.  Symbol  identifiers  include  all  uniciue 
entities  defined  by  database  users  which  may  describe  individual  relations  or  parts  of  a  re¬ 
lation.  Information  associated  with  each  identifier,  characterizes  the  state  of  the  particular 
entity  and  its  placement  with  respect  to  all  other  symbols  within  the  data  dictionary. 

There  are  two  primary  types  of  data  available  within  the  data  dictionary.  First, 
atomic  data  will  normally  consist  of  one  of  the  three  basic  data  types  found  in  most 
programming  languages:  integer,  float,  or  character.  Second,  a  database  must  distinguish 
the  different  table  or  relation  types  available  for  relation  creation  ( 12).  .\  table  type  defines 
the  structure  of  relation  that  may  be  used  for  future  instantiation  as  relation  or  a  relation¬ 
valued  attribute.  That  is,  it  provides  the  relation  definition  which  is  to  be  nested  within 
another  relation.  A  table  type  is  considered  to  be  defined  on  the  fly  because  it  is  a  new 
entry  into  the  database  and  uses  other  data  types  for  its  attributes. 

The  symbol  table  is  comprised  of  the  table  types  defined  for  the  nested  relational 
database  system  along  with  their  individual  atomic  or  relation-valued  attributes.  Once 
a  relation  is  created,  using  of  the  existing  table  types,  the  catalog  manager  parses  the 
relation  information  and  routes  it  to  the  relation  table  rather  than  the  symbol  table. 

.4s  expressed  earlier,  the  data  dictionary  is  a  repository  of  symbol  identifiers  and 
associated  information  characterizing  each  of  the  symbols  in  the  symbol  table.  The  same 
requirements  hold  true  for  the  data  dictionary’s  relation  table.  Several  of  these  elements  in 
combination  with  each  other  provide  the  uniqueness  each  symbol  or  relation  requires  for  the 
catalog  manager  to  distinguish  among  the  considerable  amount  of  entries  possible  within 
the  data  dictionary.  The  design  of  any  type  of  database  system  will  require  a  minimum 
.set  of  table  characteristics  to  enable  the  catalog  manager  to  easily  access  the  necessary 
symbol  or  relation  and  acquire  the  information  requested  by  another  system  component. 
Under  the  nested  relational  database  structure,  the  attributes  required  for  these  two  tables 
normally  include  the  index,  name,  data  type,  and  any  reference  to  another  entity  in  the 
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same  table  or  the  other  table. 

3.4- 1  Name.  One  property  to  which  the  user  is  most  likely  to  relate  is  the  name  of 
the  table  entry.  Upon  entry  of  a  table  type  or  relation,  the  user’s  chosen  name  is  added  to 
the  catalog  manager  tables. 

The  use  of  a  name  does  permit  useful  dissemination  of  data  dictionarv  information, 
although  it  may  be  used  more  than  once  among  the  attributes  of  the  defined  table  typi's  in 
the  symbol  table.  Other  attributes  are  acquired  for  each  of  the  table  entries  to  eliminate 
any  ambiguity  during  catalog  manager  scans  of  the  table  data. 

3.4.2  Type.  A  system  designed  under  an  object-oriented  environment  emphasizes 
strong  typing  facilities.  Data  typing  lends  itself  best  to  mapping  real-world  problems  and 
their  solution  domains  into  the  corresponding  software  domains.  The  symbol  table  ensures 
all  data  types  are  preserved  as  long  as  the  object  is  defined  in  the  data  dictionary. 

3.4- 3  Level.  The  level  of  a  symbol  in  the  symbol  table  provides  further  information 
about  a  specific  object  in  the  database  system.  There  are  two  basic  levels  available  for  an 
object  within  the  nested  relational  data  model.  Although  a  relation  may  have  any  numlier 
of  nesting  levels,  the  only  distinction  among  the  elements  within  the  symbol  table  focu.sos 
upon  the  relationhip  between  a  table  type  and  its  attributes,  whether  atomic  or  relation¬ 
valued.  Therefore,  the  inherent  number  of  levels  is  tw'o.  That  is,  a  symbol  is  at  the  tublr 
or  the  attribute  level.  The  level  property  attempts  to  augment  the  attribute  property  by 
supporting  the  distinction  between  table  symbols  and  relation-valued  symbols.  Level  is  of 
no  concern  within  the  relation  table. 

3.4- 4  Number.  This  data  dictionary  attribute  capsulizes  the  definition  of  a  table 
type  through  application  of  a  quantitative  measure.  After  parsing  of  the  table  type  defini¬ 
tion.  the  number  of  attributes,  atomic  and  relation- valued,  provide  a  gauge  as  to  the  size 
of  the  relation.  Use  of  a  table’s  size  number  allows  succinct  scans  of  relations  and  their 
nested  relations.  In  addition  to  using  the  number  as  a  size  metric  for  table  types,  the  ^izc' 
of  atomic  attributes  defined  as  character  arrays  may  also  be  stored  here. 


3.4-5  Parent.  To  further  enhance  the  cohesiveness  of  the  data  dictionary,  l  ai  h 
of  the  items  in  the  symbol  table  bears  a  direct  reference  to  its  defining  table  type.  In 
conjunction  with  the  /cue/ designation,  the  parent  attribute  identifies  a  particular  syinbolV 
table  type.  Symbols  at  the  table  level  use  an  arbitrary  level  designator  since  there  is  no 
defining  table  type  or  database  at  this  point.  If  more  than  one  database  is  allowed  to  <'xist 
within  the  system,  then  each  table  level  attribute  may  enter  the  name  of  the  databasi'  in 
which  it  resides. 

A  number  of  other  statistics  may  increase  the  eiTectiveness  of  the  data  dictionary, 
but  the  above  attributes  apparently  provide  adequate  coverage  of  necessary  data  ehunents 
within  the  database.  The  number  of  attributes  per  table  entry  is  small  enough  not  to  over¬ 
burden  the  database  system,  while  providing  a  fully  functional  nested  relational  database 
system. 

3.5  Query  Optimizer 

The  query  optimizer  is  a  standard  system  component  capable  of  transforming  a  user's 
input  query  tree  into  an  efficient  query  access  plan.  In  effect,  the  component  optimizes 
the  information  embedded  within  the  query  in  accordance  with  rules  devised  by  the  DBE 
for  a  nested  relational  database  system.  The  query  tree  is  formed  according  to  DDE  data 
structures  during  parsing  of  the  input  query.  As  the  tree  progresses  through  the  optimizer, 
information  within  the  tree  nodes  are  used  for  optimization  purposes,  and  the  resulting 
plan  is  deposited  at  the  output  of  the  component. 

The  query  tree  is  built  node- by- node  based  upon  information  parsed  from  the  user's 
database  request.  The  tree  is  linked  together  by  means  of  query  nodes,  each  node  repre¬ 
senting  the  objective  of  a  relational  algebra  operator. 

Because  the  nested  relational  data  model  relies  upon  an  extended  relational  algebra 
as  well  as  the  nest  and  unnest  operators,  each  node  must  earmark  enough  space  for  each 
of  the  operators  and  their  respective  list  structures.  Depending  upon  the  specific  operator, 
lists  will  be  necessary  data  structures  to  hold  auxiliary  data  permitting  the  operators  to 
descend  relations  and  iheir  relation- valued  attributes  under  the  current  query's  considera- 
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tion.  The  query  tree  is  a  binary  structure  comprised  of  query  or  tree  nodes,  with  braiu  he^. 
representing  the  list  structures,  extending  from  each  of  these  tree  nodes.  .As  tiic  tree  is 
traversed  within  the  optimizer,  the  lists,  if  present,  will  also  have  to  be  navigated  ledore 
departing  the  node. 

The  optimizer  component  itself  is  generated  by  the  E.xodus  tool  known  as  the  o|)ti- 
mizer  generator.  The  generator  is  a  rule-based  tool  requiring  an  input  of  four  elennnits  to 
produce  the  necessary  optimizer  for  a  specific  data  model  or  application.  .\s  already  staleil. 
the  operators  are  extended  versions  of  the  traditional  relational  algebra  and  the  addition 
of  the  nest  and  unnest  operators.  Briefly,  the  four  elements,  placed  in  a  description  file 
include: 

•  Operators. 

•  Operator  methods. 

•  Transformation  rules. 

•  Implementation  rules. 

The  operators  are  the  actual  operators  required  for  the  nested  relational  data  niudel. 
These  include  at  least  the  following  operators: 

•  Project 

•  Select 

•  Join 

•  Nest 

•  Unnest 

Other  operators  which  will  factor  into  the  full-fledged  model  include  the  iliffertMue. 
the  union,  and  the  intersection  operators.  To  provide  more  efficient  operations  within 
this  particular  data  model,  these  operators  will  eventually  need  to  be  included  during  the 
development  of  the  database  system  since  they  are  necessary  for  set  functionality. 
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In  order  to  use  operators,  their  implementations  must  be  compiled  and  readilv  avail- 
able  as  object  modules  within  the  system.  The  implementations  may  be  numerous  for 
each  one  of  the  operators.  For  example,  the  project  operator  may  use  either  a  fde  scan  or 
a  filtering  implementation.  Each  implementation  is  known  as  an  operator  method  and  is 
consequently  included  in  the  description  file  of  the  optimizer  generator. 

The  transformation  and  implementation  rules  deal  primarily  with  how  tin-  operator 
and  its  methods  are  to  be  used.  Depending  upon  the  nature  of  the  query,  certain  methods 
may  or  may  not  be  possible.  If  the  query  is  under  optimization,  these  two  rules  in  tin- 
description  file  are  scanned  to  determine  which  methods  are  permissible,  and  of  these 
permissible  ones,  which  one  is  the  most  efficient,  .\gain,  the  DBFi  determines  what  rules 
are  required  for  these  last  two  categories  of  the  description  file. 

After  the  query  tree  undergoes  a  transformation  according  to  the  rules  documented 
in  the  description  file,  the  query  is  still  in  the  form  of  a  binary  tree  but  the  operators  have 
been  replaced  by  operator  methods.  The  query  is  now  closer  to  an  actual  implementation 
and  another  level  of  translation  is  required  to  obtain  the  actual  source  code  for  compilation. 
The  query  access  plan,  cls  it  is  now  known,  is  passed  to  the  query  compiler. 

Since  the  query  optimization  phase  of  the  database  system  is  extremely  import  a /it. 
its  implementation  is  as  important  as  the  nested  relational  data  niodel  itself.  The  go;d  of 
developing  the  data  model  is  to  seek  increased  performance  in  comparison  to  the  traditional 
relational  data  model.  Because  of  this  aspect  of  the  research,  the  optimizer  n/Ies  are  under 
design  and  implementation  by  another  student  and  not  within  the  scope  of  this  [iroject. 

3. 6  Query  Compiler 

The  incoming  query  access  plan  (or  plan  for  short)  is  now  in  an  efficient  structure 
ready  for  processing.  The  query  compiler  is  ready  to  compile  the  generated  E  code  into 
an  object  module.  However,  the  plan,  still  in  a  tree-structure,  must  be  transformed  into  K 
source  code.  That  is,  the  tree  must  be  “walked”  to  generate  the  E  code. 

The  translation  of  the  plan  into  E  source  code  is  performed  by  an  important  function 
referred  to  as  the  tree-to-E  routine.  The  tree-to-E  routine  takes  one  node  at  a  time  ami 


.3-10 


transforms  its  data  into  an  E  source  file.  At  each  node,  the  operator  nunhod  iiiust  1>(> 
extracted  and  sent  to  the  E  file.  Embedded  within  the  plan  are  commands  to  form  all 
intermediate  relations  or  temporary  relations  if  necessary,  and  the  final  relation  iiiftTO'd 
from  the  query.  Intermediate  relations  may  have  to  be  established  between  upstre.nm  and 
downstream  query  nodes  to  store  the  extracted  data.  The  tree-to-E  routine  mii.st  generate 
the  structures  for  these  intermediate  and  final  relations. 

The  tree-to-E  routine  uses  a  depth-first  traversal  of  the  plan  to  basically  climb  th<' t  ree 
from  bottom  to  top.  This  is  necessary  because  the  bottom  nodes  contain  the  references 
to  the  system’s  relation  tables  which  all  user  queries  must  intially  access.  Hetwemi  the 
other  plan  nodes,  the  data  is  sent  downstream  for  further  processing.  In  other  words,  the 
extracted  information  is  sent  to  the  next  upper  node  in  the  plan  tree,  ’fhe  ilevelopmont 
of  each  of  the  intermediate  structures  depends  upon  the  auxiliary  information  associated 
with  each  of  the  plan  nodes. 

The  au-xiliary  nodes  attached  to  each  of  the  query  nodes  support  logical  navigation 
throughout  the  nested  relations.  The  nested  relational  data  model  requires  a  <lesceiit 
i)f  some  fa.shion  into  a  top-level  relation  to  the  appropriate  nested  relation  or  relation- 
valued  attribute  for  which  the  operator  under  consideration  must  apply  its  method.  The 
auxiliary  nodes  maintain  this  navigation  information  to  guide  the  operator  to  tiie  (iiKuied 
nested  relation  as  well  as  atomic  attributes.  Similar  to  walking  the  plan  tree,  the  tree-lo-K 
routine  traverses  the  au.xiliary  node  lists  to  translate  this  information  into  operations  to 
be  performed  upon  atomic  and  relation-valued  attributes. 

Eventually  the  tree-to-E  routine  reaches  the  top  node  of  the  plan  tree.  Once  all  the 
information  in  the  auxiliary  nodes  are  processed,  the  main  routine  must  be  appended  to  all 
other  query  declarations  and  definitions  extracted  from  the  plan  tree.  Because  the  (piery 
is  the  primary  impetus  of  compiling  all  modules  into  a  cohesive  executable  module,  tin' 
addition  of  the  main  function  occurs  at  this  final  stage  of  E  code  generation.  I'lie  (pnuy 
is  finally  ready  for  transfer  into  the  E  compiler.  The  object  modules  ami  the  general ('d 
query  Fi  code  are  gathered  together  for  compilation  together  within  the  system's  compih'r. 
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3.  7  E  Compiler 

The  query  is  now  located  in  what  may  be  considered  the  “backend"  of  the  ilatabas(' 
system.  At  this  point,  no  more  processing  of  the  query  will  tran.pire  since  it  is  in  its 
most  efficient  state.  The  tasks  at  hand  focus  on  compiling  the  generated  h  code  and 
then  linking  and  loading  it  with  the  necessary  object  modules  for  query  execution,  l  lu' 
operator  and  access  methods  are  implemented  at  this  point  and  their  object  modules  exist 
in  an  active  mode  for  any  number  of  compilations  with  user  queries.  That  is.  the  (|uery. 
after  compilation,  will  be  dynamically  linked  to  the  E  run-time  system  (ERTS)  (2).  1  in- 
relations  within  the  database  system  are  persistent,  so  the  compilation  must  occur  with 
the  E  compiler.  Just  linking  the  E  library  with  the  C  compiler  will  not  permit  the  handles 
to  storage  objects  to  be  coupled  with  the  executable  module.  All  compilation,  linking,  and 
loading  must  occur  employing  the  E  compiler. 

The  target  of  the  executable  module  is  the  Exodus  storage  manager.  .\lt  hough  the 
nested  relational  database  system  is  an  executable  module  developed  using  thi'  Exodus 
tools,  the  query  is  also  an  executable  module  requiring  Exodus  resources.  The  database 
program  must  manage  the  query  in  the  form  of  a  smaller  program,  directing  t  he  execut  aide 
module  to  a  file  for  driving  the  query.  The  system  must  set  up  a  file  to  deposit  the  loaded 
rode  before  execution  of  the  query.  Thus,  a  primary  task  of  the  "backend"  is  to  handle 
system  buffers  and  files  for  transferring  of  modules  before  final  query  execution. 

3.3  Operator  Methods 

Since  a  query  tree  produced  by  the  parser  only  contains  the  relational  operaiois 
required  for  the  user’s  query,  methods  must  be  maintained  in  the  ERJ'S  to  implement 
the  operations.  However,  each  of  the  nested  relational  algebra  operators  may  contain  aii\ 
number  of  methods  to  implement  it.  Two  of  the  most  popular  methods  pi-rform  some 
form  of  file  scan  of  the  relations  in  the  storage  manager,  or  a  tuple  filtering  tt'chnique  to 
create  a  stream  of  data  leading  from  one  or  more  relations  into  one  final  relation.  Once 
the  DDE  designs  and  implements  the  operator  methods,  the  object  modules  riunain  in  the 
“backend"  for  all  transient  queries. 
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3.9  ,4ccess  Methods 


Along  with  operator  methods,  access  methods  such  as  B+  trees  niny  euhaiuf  tin- 
efficiency  of  performing  the  query.  Just  as  the  operator  methods,  the  accr-ss  metluxU 
must  be  designed  and  implemented  so  that  their  objects  will  be  available  at  th<>  linn-  ot 
query  compilation.  For  nested  relations,  access  methods  will  still  be  operating  on  o-lai  loii'. 
although  navigation  methods  embedded  within  the  access  plan  exitting  tin-  (jucry  op'iiiii/>  r 
must  guide  the  access  method  to  the  correct  attribute.  Just  as  with  the  operator  nn-t  liods. 
the  E  compiler  links  and  loads  the  necessary  access  methods  to  the  specific  (pii-ry. 

3.  JO  Storage  Manager 

The  storage  manager  is  the  component  required  to  manage  all  storage  objects  in¬ 
stantiated  within  the  database  system.  .\  nunrber  of  functions  are  available  to  t'-.e  DBE 
to  handle  specific  objects.  These  functions  allow  the  DBE  to  manipulate  objects  at  the 
byte  level,  inserting  and  deleting  bytes  as  deemed  necessary.  To  enhance  performance  of 
a  particular  data  model,  storage  object  primitives  may  be  used  to  direct  database  system 
operations  at  this  level.  However,  to  initially  run  the  system  at  a  higher-level  of  abstrac¬ 
tion  for  the  nested  relational  data  model,  this  level  is  not  tampered  with  since  tin’  storagi- 
manager  adequately  creates  and  modifies  objects  as  required. 

?. //  Database 

The  database  works  in  conjunction  with  the  storage  manager  to  extract  the  necessary 
information  from  defined  relations.  The  relations  are  instantiated  obj('cts  permitting  an 
arbitrary  level  of  nesting  to  represent  the  nested  relational  data  model. 

3.12  Summary 

The  Exodus  extensible  architecture  is  comprised  of  tools  and  components  required 
for  designing  a  database  system  with  respect  to  a  specific  application  or  data  modt-l. 
This  research  effort  uses  this  development  effort  to  construct  a  nested  relational  datalnase 
management  system. 


IV.  Dtsign  and  Irnplf  mcntntion 


4.1  Nested  Relation 

A  primary  objective  for  implementing  the  nested  relational  data  nuxh'l  is  to  incii  a.-c 
system  performance  over  other  systems  for  non-traditional  database  applications.  1  io^ 
performance  translates  into  efficient  retrieval  of  information  with  re.spect  to  other  data 
models  such  as  the  traditional  relational  data  model,  the  network  model,  and  tlie  hieiar- 
chical  model.  In  addition  to  efficiency,  the  data  model  lends  itself  well  to  object-oriented 
design  (OOD)  (.3),  an  increasingly  popular  methodology  for  transformation  (jf  real-worid 
problems  into  the  software  domain.  Data  modeling  of  composite  objects  ( 10)  also  relie.,  on 
the  concept  of  nested  relations. 

Because  of  the  relatively  new  approach  of  OOD  and  its  corresponding  languages 
(.■\da.  C-f4-,  Smalltalk),  the  design  and  implementation  of  nested  relations  has  focused  on 
other  methodologies.  Each  of  these  may  rely  on  such  techniques  as  the  phy.si<  al  stortige 
of  the  data  structures  maintaining  the  data  of  a  composite  object,  rather  than  the  logical 
nature  of  the  model.  The  logical  structure  of  nested  relations  is  not  propagated  t  hroughout 
the  entire  system,  a  major  goal  of  this  thesis. 

The  Exodus  database  system’s  programming  language  allows  for  the  placement  of 
identically  defined  objects  within  a  collection.  In  (2)  and  (5),  dbclass's  and  ilbstnict's 
are  declared  as  collections  of  structures,  where  a  structure  represents  the  attribute-,  i,f 
a  relation.  The  collection  is  actually  a  relation,  where  the  structures  are  tuph's  of  the 
relation. 

Instead  of  just  placing  single,  atomic  attributes  within  a  structure,  the  collection 
construct  is  also  allowed  within  a  structure.  In  this  manner,  it  is  possible'  to  imp!,  itient 
the  concept  of  nested  relations.  Exodus  permits  the  data  model  to  have'  a  logical  e(|uivaleiit 
using  this  construct. 

In  addition  to  defining  nested  relations  via  collections,  the  logical  nat  ure  of  i  In'  <!at  a 
structure  docs  propagate  throughout  all  components  comprising  the  systi'in.  Routiin'.-- 
[lassing  relation  arguments  can  expect  the  same  identical  data  structure  throughout  the 
entire  system,  from  the  catalog  manager  to  the  storage  manager. 
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dbstriict  child  { 
dbchar  name  [20]; 
dbint  age; 
dbchar  sex[2]; 

public: 

child (char  *,  int,  char  ♦); 
char  *  get.nameO; 
int  get.ageO: 
char  get.sexO  ; 

}; 

Figure  4.i.  The  child  relation  specification. 

4.1.1  Implementation.  The  design  of  the  nested  relation  calls  for  a  simple,  efficient, 
and  logical  method  of  translating  the  abstract  concept  of  nested  relations  into  the  software 
domain.  Exodus  adds  several  constructs  to  the  C++  programming  language  which  permit 
such  a  definition.  The  collection  construct  and  several  associated  commands  guide  the 
development  of  the  nested  relation. 

The  previously  defined  relations  in  Chapter  2  are  used  in  the  foregoing  e.xamples 
to  illustrate  implementation  procedures  for  a  nested  relation.  The  child  relation  fouml 
in  Figure  2.1,  requires  specifying  its  attributes  using  a  structure  construct.  Since  the 
relation  is  to  be  persistent,  the  structure  must  be  prefixed  with  the  two  characters  "db". 
Several  other  syntactic  peculiarities  distinguish  Exodus  structure  definition  from  C  +  +  and 
C  structures  and  will  be  noted  as  necessary. 

The  structure  definition  for  the  child  relation  is  shown  in  Figure  4.1.  .As  part  of 
the  specification,  private  and  public  members  exist  to  preserve  data  hiding  features.  The 
three  private  data  elements  are  the  attributes  of  the  relation,  while  the  member  functions 
perform  the  insertion  of  data  and  access  to  each  of  the  data  elements.  The  con.strnctor 
requires  the  three  data  types  for  the  data  elements  when  allocating  space  for  a  tuple.  .Since 
the  member  functions  are  public,  they  permit  access  to  the  relation  d.ata. 

The  semicolon  appended  to  the  dbstruct  body  is  different  than  C.  where  none  is  ixMiuired. 

The  implementation  of  the  member  functions  are  found  in  a  “.e"  file.  In  Figure  1.2. 
these  functions  basically  use  assignment  of  data  from  a  dummy  variable  into  the  data 
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child:  :chilci(char  *  n,  int  a,  char  s)  { 
char  *  dsstl  *  name; 
while(*de8tl++  =  n++) ; 
age  =  a; 

char  *  dest2  =  sex; 
while(*dest2++  =  *n++) ; 

> 

char  *  child; :gQt_name()  { 
dbchar  *  source  =  name; 
char  *  dest  =  new  char [20]; 
while(*de3t++  =  3ource++) ; 
returnCftdest [0] ) ; 

} 

int  child: :get_age()  { 
retum(age) ; 

} 

char  child: :get_sex()  { 
char  ♦  source  =  sex; 
char  ♦  dest  *  new  char [2] ; 
while(*dest++  =  *30urce++) ; 
return(ftdest[0] ) ; 

> 

Figure  4.2.  Member  function  implementation  for  child  relation. 


member.  A  character-by-character  transfer  is  used  for  string  data.  E.xodus  also  (jermits 
the  ease  of  implicitly  converting  “db"  types  to  non-db  ones  during  simple  assignment.  I'his 
allows  the  easy  access  and  conversion  of  data  for  use  in  other  routines.  The  double  colon, 
cissociates  each  object  with  its  member  functions. 

Once  the  child  relation  is  specified,  it  may  be  used  within  other  structure  definitions. 
To  nest  the  child  relation  within  the  employee  relation,  a  collection  of  child  structures 
becomes  the  relation- valued  attribute  children.  Figure  4.3  illustrates  the  employee  relation. 

The  definition  of  the  collection  is  a  two-step  process.  First,  a  dbclass  type  must 
be  specified  for  the  collection  of  child  tuples.  Second,  the  children  attribute  is  ileclared 
as  the  collection  type  childRVA.  The  “RV.^”  suffix  is  an  abbreviation  for  '‘relation-valued 
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dbstruct  employee  { 
dbchar  dept [5] ; 
dbchax  name [20] ; 
dbint  age; 
dbchar  ssn[ll] ; 

dbclass  childRVA; collection [child] ; 
childRVA  children; 
public: 

employeeCchar  *,  char  ♦,  int,  char  *); 

char  *  get_dept(); 

char  *  get_name(); 

int  get.ageO; 

char  ♦  get.ssnO  ; 

>: 


Figure  4.3.  The  nested  relation  specification  for  employee. 


dbclass  employee.rel : collection [employee] ; 
persistent  employee. rel  employees; 

Figure  4.4.  Declaration  of  a  persistent  employees  relation. 


attribute.” 

The  member  functions  refer  to  only  the  atomic  attributes  of  the  relation,  while  the 
collection  of  child  structures  are  referenced  by  the  C  address-of  operator,  By  latching 
to  the  address  of  a  particular  employee  tuple  via  the  k  operator,  its  child  tuples  may  bo 
further  referenced,  with  all  the  member  functions  of  the  c/izW  object  visible  to  the  ernployie 
object.  Thus,  the  recursive  descent  into  nested  relations  is  primarily  possible  by  using  the 
k  operator. 

The  employee  relation  is  actually  a  collection  of  these  structures,  so  the  letter  "s"  is 
appended  to  employee  and  a  persistent  object  is  declared  as  this  collection  of  structures, 
shown  in  Figure  4.4. 

Whenever  the  object  is  to  be  used  in  some  application  or  query,  an  e.xternal  variable 
declaration  is  used  in  the  source  file  requiring  the  relation.  The  keywords  "persistent"  and 
"employeeRVA”{in  this  example)  must  also  be  included  in  the  “extern”  declaration. 
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4-2  The  Parser 


The  parsing  component  drives  the  various  functions  of  the  database  system.  .Several 
functions  pertaining  to  the  catalog  manager  provide  input  to  the  data  dictionary,  wliile  a 
number  of  other  procedures  involve  the  structuring  of  the  query  tree.  The  parser  passes  t  he 
query  tree  to  the  query  optimizer,  while  the  query  access  plan  is  sent  to  the  E  source  code 
translation  routines  via  this  componeiu.  In  essence,  this  parser  steers  the  functionality  of 
the  system. 

Although  Exodus  expects  a  parser,  no  tool  or  editor  is  provided  in  developing  the 
component.  The  UNIX  tool,  YACC,  is  a  suggested  instrument  for  constructing  this  el¬ 
ement,  as  it  was  also  used  in  the  Exrel  relational  database  system.  V.\CC  translates 
grammar  rules  in  Backus-Nauer  Form  into  the  C  programming  language.  Since  E-specific 
language  constructs  cannot  be  used  within  a  YACC  file,  the  routines  invoked  within  the 
parser  must  reside  within  an  E  file. 

In  modularizing  the  functionality  of  the  parser,  grammar  rules  separate  the  routines 
of  the  different  components.  Those  procedures  interacting  with  the  catalog  manager  are 
grouped  under  data  definition  language  statements,  whereas  query  statements  contain  tlie 
query  tree  building  routines.  These  latter  statements  guide  the  final  cpiery  tree  to  the 
(juery  optimizer,  and  pass  the  optimizer’s  output  to  the  E  source  code  routines.  Upon 
e.xecution  of  the  query,  the  parser  regains  control  of  the  system. 

4.2.  J  Implementation.  The  function  of  the  parser  is  two-fold.  First,  it  must  interact 
with  the  catalog  manager  to  insert  data  definitions  into  the  data  dictionary.  Second,  it 
controls  the  parsing  of  the  query  into  a  query  tree  representing  the  Colby  relational  algebra 
for  nested  relations.  Since  the  catalog  manager  and  the  query  tree  structure  rely  on  C-(--f 
and  E  constructs,  the  parser  is  function-driven,  passing  the  necessary  operands  to  their 
appropriate  destinations. 

Two  separate  data  definition  commands  are  possible  when  interacting  with  the  cat¬ 
alog  manager.  To  create  the  table  types,  create  type  prefixes  the  type  information.  In 
order  to  enter  the  child  table  types  into  the  data  dictionary,  the  following  command  is  used 
for  this  purpose: 
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create  type 

child  ■  (name : char (20) ,  agerint,  sex: char (2)) 
create  type 

employee  =  (dept ; char (5) ,  name : char (20) ,  ago:int, 
ssn:char(ll) ,  children: child) 

7  his  information  is  placed  in  the  data  dictionary’s  symbol  table  as  discussed  b('low. 

The  second  data  definition  command  permits  the  declaration  of  actual  nested  re¬ 
lations  for  the  database  via  the  keywords  create  table.  For  example,  to  declare  ilie 
employees  relation,  the  command  is  used  in  the  following  manner: 

create  table 
employees : employee . 

As  with  the  symbol  table,  the  relation  table  maintaining  the  information  for  relations 
residing  in  the  database  is  discussed  below.  The  data  for  the  relation  must  be  present  in 
a  (j'lNTX  file,  laid  out  in  the  following  format: 

[  top-level  attr,  top-level  attr,  ...  ■(  (nested  attrs)  ...  }] 

An  example  of  this  format,  from  :i.e  employees  relation,  may  be  illustrated  in  the 
following  way: 

C  Acet,  Washington  A.B.,  33,  192-83-7465  {  (Bob,5,M)  (Carol, 4, F) 

>] 

C  Eng,  Carter  G.H.,  38,  325-96-0127  {  (Kyle, 10, M)  (John, 12, M) 

(Lynne, 6, M)  }] 

[  Dev,  Kennedy  E.F.,  45,  519-73-3790  {  (Mike, 7, M)  }] 

[  Res,  Lincoln  C.D.,  44,  234-61-9825  {  (Tom,5,M)  (Jeremy, 4, M) 

(Tiffany, 7, F)  (Anele,l,F)  }] 

After  declaring  the  employees  relation,  the  appropriate  header  files  arc  constructi'd 
and  implementation  routines  invoked  to  insert  the  data  into  the  object. 
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The  parser  must  assist  in  building  the  query  tree  structure  for  the  query  optimizer. 
Since  there  are  three  possible  operators,  the  parser  must  recognize  the  following  tliree 
query  formats: 

PJ  •[  project  list  >  (  relation  or  query  ) 

SL  (  relation  or  query  )  C  condition  ]  {  select  list  } 

NJ  (  relation  or  query  )  •[  join  path  }  (  relation  or  query  ) 

These  symbols  represent  the  project,  select,  and  natural  join  operators,  rcs[)c'(  ti  vnly. 
Within  parenthesis,  another  query  may  be  nested  instead  of  a  relation,  forming  the  se¬ 
quence  of  relational  operators  for  the  query  tree. 

The  parser  is  concerned  with  invoking  the  appropriate  functions  and  passing  the 
correct  arguments.  All  three  operators  must  construct  an  operator  list,  although  the 
natural  join  operator  list  is  known  as  a  “join  path.”  The  parser  passes  the  given  relation- 
valued  and  atomic  attributes  with  the  make_list{)  function. 

For  each  of  the  three  operators,  a  query  node  must  be  created  and  the  o[)eraror 
inserted  into  the  correct  portion  of  the  node.  The  parser  passes  the  operator  to  the  (|uery 
tree  building  routines  through  the  function  make_qNo(ie{).  If  a  relation  name  is  part  of 
the  query,  rather  than  a  nested  query,  the  name  is  also  passed  to  the  query  node. 

If  a  selection  operator  is  undergoing  parsing,  the  conditions  or  predicates  also  recpiire 
separate  predicate  nodes.  The  square  brackets  delimiting  a  condition  indicate  that  the 
enclosed  arguments  must  be  allocated  space  through  the  function  call.  make_pred(].  riiis 
function  passes  information  regarding  attribute  names,  the  operator  to  be  used  (  = 

, ...),  and  the  criteria  to  match  the  attribute  against. 

4-3  The  Catalog  Manager  and  Data  Dictionary 

An  accurate  account  of  all  tables  and  their  types  must  be  majntained  by  the  catalog 
manager  component.  The  data  elements  comprising  the  data  dictionary  are  placed  into 
tables  by  the  catalog  manager.  The  use  of  C-1-+  constructs  allows  easy  insertion  of  data 
into  the  tables,  as  well  zis  rapid  access  of  the  data. 
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name 

level 

type 

numb 

parent 

nest  type 

child 

SCHEME 

ON.THE.FLY 

3 

SYSTEM 

none 

name 

ATTR 

CHAR 

20 

child 

none 

age 

ATTR 

INT 

-1 

child 

none 

sex 

ATTR 

CHAR 

1 

child 

none 

employee 

SCHEME 

ON.THE.FLY 

5 

SYSTEM 

none 

dept 

ATTR 

CHAR 

5 

employee 

none 

name 

ATTR 

CHAR 

20 

employee 

none 

age 

ATTR 

I. NT 

-1 

none 

ssn 

.4TTR 

CHAR 

11 

employee 

none 

children 

ATTR 

RVA 

3 

employee 

child 

Figure  4.5.  The  Symbol  Table. 


Two  data  definition  commands  link  the  parser  with  the  catalog  manager.  First,  table 
types  must  be  created  before  actual  relations.  The  parser  passes  the  attributes  for  a  table 
type  to  the  catalog  manager,  creating  a  symbol  for  the  table  type  and  its  attributes.  This 
includes  relation-valued,  as  well  as  atomic  attributes.  Each  symbol  contains  information 
to  distinguish  it  from  other  symbols  in  the  table,  the  attributes  of  that  table  representing 
the  elements  of  the  symbol  table. 

The  second  data  definition  command  is  used  to  insert  relations  into  a  second  data 
dictionary  table.  Although  a  table  type,  as  defined  in  the  symbol  table,  refers  to  a  particular 
relational  structure,  any  number  of  actual  objects  may  be  declared  as  one  of  these  tal)le 
types.  Once  a  table  or  relation  is  declared  to  be  one  of  these  table  types,  several  routines 
are  involved  to  construct  a  file  to  read  in  the  data  for  the  new  relation.  The  relation 
data  dictionary  table  must  follow  the  progress  of  each  created  relation  ensuring  that  its 
information  coincides  with  that  of  the  actual  relation. 

4.3.1  Implementation.  The  catalog  manager  receives  data  definition  argunn  iits 
from  the  parser’s  create  type  and  create  table  commands.  Two  tables  receive  this  data: 
the  symbol  table  for  table  type  information  and  the  relation  table  for  relations  persisting 
in  the  database.  From  the  child  and  employee  table  types  and  the  employees  relation,  the 
symbol  and  relation  tables  appear  as  in  Figure  4.5  and  Figure  4.6. 
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Figure  4.6.  The  Relation  Table. 


Since  both  tables  are  defined  as  collections,  member  functions  are  used  by  oi  her 
components  to  access  and  obtain  the  necessary  data. 

The  Query  Tree 

As  an  input  object  for  the  query  optimizer,  the  query  tree  must  be  constructed 
to  adhere  to  the  optimizer’s  requirements.  However,  several  parameters  are  flexible  and 
permit  the  DBF  to  build  the  tree  with  a  particular  relational  algebra  in  mind.  The  tree 
itself  is  nothing  more  than  a  series  of  query  nodes  linked  together  to  logically  access  the 
database  with  respect  to  the  relational  operators. 

Each  query  node  represents  a  relational  operator  in  the  relational  algebra,  in  this  case 
the  Colby  algebra.  One  node  may  point  to  any  number  of  other  nodes,  although  this  arity 
is  normally  set  to  two.  This  accounts  for  the  fact  that  the  standard  relational  operators 
are  either  unary  (project,  select)  or  binary  (natural  join).  Besides  the  relational  operator 
and  the  arity,  the  node  must  contain  argument  information.  The  argument  allocates  space 
for  relation  names,  an  operator  list,  and  selection  condition  or  predicate. 

Since  the  argument  provides  the  information  for  the  relational  operator  for  which  the 
node  was  created,  this  requires  additional  design.  Since  the  Colby  algebra  requires  operator 
lists  to  recusively  descend  nested  relations,  the  argument  operator  list  must  maintain  this 
particular  information. 

In  addition  to  the  operator  list  branching  out  from  the  argument  list  structure,  a 
selection  condition  may  also  form  another  branch.  If  a  condition  is  applied  to  the  top-level 
attributes  of  a  relation,  the  argument  selection  data  element  points  to  the  data  structure 
maintaining  this  condition. 

To  construct  the  auxiliary  branches  for  the  operator  lists  and  top-level  conditions, 
two  new  actors  must  supplement  the  query  tree  structure.  The  operator  list  requires  a  list 


operator 


argument 


input[0]  iuput[l] 


Figure  4.7.  The  QUERY  node. 


name 

pred 

list 

Figure  4.8.  The  ARGUMENT  substructure. 

node  to  maintain  information  of  attributes  to  be  projected  or  navigated  across  to  reach 
other  attributes.  A  predicate  node  is  used  to  maintain  information  about  attributes  against 
which  condition  criteria  are  applied. 

The  query,  as  entered  by  the  user  and  subsequently  separated  into  query  nodes  by  the 
parser,  is  formed  in  a  top-to-bottom  structure.  The  top  node  represents  the  first  relational 
operator  entered,  while  the  bottom  node  requires  information  for  the  relation  or  relations 
involved  within  the  query. 

i.4-1  Implementation.  The  query  tree  requires  three  separate  nodes  for  its  con¬ 
struction.  First,  the  query  node  is  made  of  the  elements  found  in  Figure  4.5^, 

\ 

The  operator  is  defined  as  a  long  integer,  the  two  input  elements  as  pointers  to  (piery 
nodes,  and  argument  is  a  multiple  element  structure  ris  defined  in  Figure  4.8. 

“Name”  is  a  character  array  for  the  relation  that  the  relational  operator  is  to  operate 
on,  “pred”  is  a  pointer  to  a  predicate,  and  “list”  is  a  pointer  to  an  operator  list  node. 

The  predicate  node  is  structured  as  shown  in  Figure  4.9.  The  “op”  element  is  a  long 
integer,  while  the  “left”  and  “right”  elements  are  pointers  to  additional  predicate  nodes. 

The  list  node,  illustrated  in  Figure  4.10,  maintains  the  information  to  navigate  a 
nested  relation.  An  “attribute  descriptor”  provides  information  for  the  attribute  the  node 
represents,  the  “condition”  element  points  to  a  predicate,  the  “sublist”  points  to  a  nested 
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left 


op 

right| 

Figure  4.9.  The  PRED  node. 


attr  I  cond  I  sublist  I  next 


Figure  4.10.  The  LIST  node. 


relation,  and  the  “next”  element  points  to  an  attribute  at  the  same  level  of  nesting. 

.\n  example  of  a  tree  structure  found  in  Figure  4.11,  represents  the  following  ciuery 
pertaining  to  the  employees  relation: 

PJ  {name,  children  {name}}  (  SL  (employees)  [age=33] 

{children  [age=5] }} 

The  resulting  relation  is  shown  in  Figure  4.12. 

Once  the  parser  passes  this  structure  to  the  query  optimizer  the  optimizer  transforiu.s 
it  into  a  query  access  plan. 

4-5  The  E  Source  Code  Generator 

The  query  tree  structure  will  ultimately  be  transformed  into  a  query  access  plan 
structure  by  the  query  optimizer.  The  plan  represents  the  most  efficient  access  into  the 
database.  The  structure  of  the  plan  is  similar  to  the  query  tree,  with  apparently  minor 
changes  to  the  primary  node.  Instead  of  an  operator  data  element,  the  plan  node  defines  an 
operator  method  data  element.  In  the  query  tree,  the  relational  operator  is  regarded  in  an 
abstract  sense,  while  the  plan’s  operator  method  represents  the  physical  implementation 
of  the  operator.  They  include  file  scan,  stream,  and  filtering  methods,  depending  u[>on  the 
operator.  Other  than  the  method  element,  plan  nodes  also  require  an  argument  element 
and  pointers  to  additional  plan  nodes. 
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Figure  4.11.  A  query  tree  structure. 
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name 

children 

name 

Washington  A.B. 

Bob 

Figure  4.12.  The  resulting  relation. 


operator  method 


argument 


input[0]  I  input[l] 


Figure  4.13.  The  PLAN  node. 


The  plan  nodes  are  to  be  e.Kamined  one-by-one.  The  E  source  code  generator  relics 
on  a  “tree-to-e”  routine  which  “walks”  the  plan  and  generates  E  source  code  according 
to  the  data  in  the  plan  nodes  and  their  auxiliary  branches.  The  tree-to-e  routine  initially 
descends  to  the  bottom  nodes,  and  works  its  way  upward  filling  buffers  with  header  file 
declarations,  function  and  structure  definitions,  operator  method  class  instaiitiations,  and 
various  externaJ  variable  declarations  for  those  relation  or  relations  to  be  accessed.  Onci' 
every  plan  node  is  traversed,  a  main  routine  is  entered  into  a  buffer.  The  buffers  are  iilacial 
together  and  named  as  an  E  file. 

The  E  file  is  compiled  and  its  object  code  is  linked  with  the  E  run-time  system  (  ER'l  S  ) 
(2).  .All  object  code  for  operator  method  implementations  comprises  the  ER'F.S.  .and  the 
(piery's  object  file  is  linked  with  the  operators  it  requires.  Once  the  query  is  completed  in 
its  execution,  the  memory  allocated  for  the  plan  tree  is  released  to  the  system  for  future 
use. 


4.5.1  Implementation.  The  optimizer  provides  the  query  access  plan,  first  to  the 
optimizer,  which  hands  it  off  to  the  code  generating  routines.  The  code  generator  ex[)e(  t.s 
a  tree  structure,  much  like  the  query  tree  structure,  e.xcept  that  the  plan  node  is  st  rtict  u  rod 
as  showm  in  figure  4.13.  Here,  the  integer  representing  a  relational  operator  now  represents 
an  operator  method. 

The  code  generating  routines  require  four  separate  buffers  to  place  the  a()propi i.'te 
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(lata  and  eliminate  the  possibility  of  early  declarations  with  respect  to  oll.er  declara:  iwns. 
For  instance,  the  query, 

PJ  {  name,  children  {age>  }  f employees) 

requires  code  to  be  generated  using  the  file  scan  generator  and  a  projecticui  function.  1!..' 
E  source  code  would  first  have  to  include  the  header  file  containing  the  ‘  rnployifs  ri'la'ion 
specification.  From  the  header  file,  the  file  scan  can  be  declared  for  the  tu])|i>s  specified 
in  the  child  and  employee  structures.  The  projection  function  would  then  be  gfUKuateil , 
along  with  the  new  data  structure,  which  is  reduced  to  the  attributes  (jf  ridini  and  mje. 
Firally,  the  main  routine  is  generated  with  instantiations  for  the  file  scan.  .\n  iterator 
steps  through  each  tuple  and  returns  the  da^a  to  the  two  attributes  in  the  new  relation. 
.\a  iterator  is  applied  to  the  new  relation  and  the  data  is  displayed  to  the  .screen.  The 
generated  code  is  shown  in  Figure  4.14  and  Figure  4.1.5.  .\  discussion  of  the  file  scan  class 
and  its  implementation  may  be  found  in  the  following  section. 

4-6  The  Operator  Methods 

The  operator  methods  comprise  the  ERTS.  After  an  operator  is  designed  and  iniph'- 
niented,  its  object  code  becomes  a  part  of  the  ERTS,  where  these  methods  await  linking 
with  transient  queries.  Once  the  query  is  linked  ^o  the  ERTS,  the  e.xecutaltle  module  in¬ 
vokes  the  operator  methods  and  performs  its  member  functions.  The  opt'rator  metlu/ds 
are  objects  which  are  formulated  under  an  OOD  methodology. 

The  operator  methods,  as  generator  classes,  require  certain  parameters  specific  to  t  he 
generator.  Such  parameters  include  relation  and  tuple  types,  where  generator  classes  invoke 
iterator  functions  to  process  the  relations  and  tuples.  Since  the  methods  are  gc'iu'iator 
classes,  the  operators  require  a  header  file  for  the  operator  method  specification  and  a 
file  which  contains  the  implementation  code  for  the  method.  It  is  these  classes  uhich  ar<' 
instantiated  within  the  Ei  source  code  file  generated  via  the  query  access  plan. 

The  operator  m<’thods,  as  previously  mentioned,  may  be  one  of  several  types,  for 
instance,  a  file  scan  operator  method  may  require  a  select  or  project  function,  de|)end,;.g 
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#ifndef  FILE_SCAN_H 
iinclude  "f ile.scan.h" 

#endif 

iifndef  EMPLOYEES.H 
•include  "employees .h" 

#endif 

extern  persistent  employee_rel  employees; 

dbstruct  tempi  { 
dbchar  name [20]; 
public ; 

tempi (char  *); 
char  *  get_name(); 

}; 

tempi : :templ(char  *  n)  { 
dbchar  ♦  dest  =  name; 
while(*dest++  =  *n++) ; 

> 

char  ♦  tempi : :get_name()  { 
dbchar  *  source  =  name; 
char  ♦  dest  =  new  char [20] ; 
while(*dest++  =  *source++); 
retumCtdest  [0] ) ; 

} 

dbstruct  temp2  { 
dbchar  name [20]; 

dbclasa  teniplRVA:collection[templ]  ; 
templRVA  templ.rels; 
public: 

temp2(char  ♦) : 
char  *  get_name(); 

}; 


F'igure  4.U.  The  temporary  relation  structures. 
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temp2: :temp2(char  *  n)  { 
dbchar  ♦  dest  *  name; 
while(*dest++  =  *n++) ; 

} 

char  *  tempi :  :get_naine()  { 
dbchzir  ♦  source  =  name; 
char  ♦  dest  ■  new  char  [20]; 
while(*dest++  =  *aource++) ; 
return (tdest  [0] ) ; 

> 

dbstruct  tamp2_rel  { 

dbclass  temp2_relRVA : collection[temp2] ; 
temp2_relRVA  temp2_rels; 

>: 


dbclass  temp2CT : collect  ion [temp2_rel]  ; 
temp2CT  Temp2; 

class  erployee_rol.3can:f ile.scanCemployee,  temp2,  employee.rel] 

void  sql.proj  (employee  ♦  e,  temp2  ♦  t2)  { 
employee  ft  e.ref  =  ♦  e; 

t2  =  in  (Temp2.temp2.rel3)  new  temp2(e  ->  get .name ) ; 
iterate(templ  ♦  tl  =  e.ref  .children. scanO)  { 

tl  =  in  (e.ref  .templ.rels)  new  tempKtl  ->  get.nameO); 

} 

> 

mainO  { 

employee.rel.scan  sql (ftemployee.rel ,  NULL,  sql.proj); 
iterate(temp2  •  t2  ■  sql .next.tupleO )  { 

cout  <<  formC'emp.name:  7.s\n",  t2  ->  get.nameO); 
temp2  ft  t2_ref  »  •  t2; 

iterate  (tempi  *  tl  *  t2.ref  .templs.sczuiO)  { 

cout  <<  fonn("child.name:  y,s\n",  tl  get.nameO); 

} 

} 

} 


Figure  4.15.  Implementing  the  project  query 
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upon  the  particulaj  operator  method  residing  in  the  plan  node.  Other  forms  of  operator 
methods  include  streams  and  filters.  .4  plan  tree  with  a  series  of  nodes  normally  send  tuples 
upstream  from  one  node  to  the  next.  The  operator  methods  input  these  tuples  and  process 
the  data  according  to  their  member  functions.  Figure  4.16  illustrates  the  specification  of 
the  file  scan  operator  found  in  (2),  while  Figure  4.17  provides  the  loops  join  operator  found 
in  the  same  publication.  In  the  file  scan  method,  the  selection  function  expects  to  have  a 
destination  type  parameter  since  a  supertuple  may  not  place  all  of  its  elements  into  another 
tuple.  This  occurs  because  sets  of  data  within  a  certain  attribute  may  or  may  not  meet 
the  selection  criteria. 

T7  Testing  and  Validation 

After  implementing  the  model  within  the  Exodus  architecture,  several  simple  ([ueries 
were  applied  to  the  database.  The  query  tree  structure  was  built  for  most  combinations 
of  the  project,  select,  and  join  operators.  The  E  code  generators  produced  code  for  the 
query  for  one  level  of  nesting,  invoking  either  the  file  scan  or  loops  join  operators.  No 
performance  tests  have  been  conducted  on  the  system. 
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class  file.scan  [ 
dbclass  srcType  {>, 
dbclass  dstType  {}, 

dbclass  srcRalType  {  public:  iterate  srcType  *  scanO ;  } 

] 

typedef  int  (*selFunc) (srcType* ,  dstType*); 
typedef  void  (*projFunc) (srcType* ,  dstType*); 
srcRelType  *  relation; 
selFunc  select; 

projFunc  project; 

public : 

f ile_scan(srcRelType* ,  selFunc,  projFunc); 
iterator  dstType  *  next.tupleO  ; 

}; 

f ile.scan: :f ile.3can(srcRalType  *  rptr,  selFunc  sPtr,  projFunc 
pPtr)  i 

relation  =  rPtr; 
select  =  sPtr; 
project  =  pPtr; 

} 

iterator  dstType  *  f ile.scan: :next.tuple()  { 
dstType  rsltTuple; 

iterate(srcType  *  tuplePtr  =  relation  ->  scanO)  { 
if  ((select  *=  NULL)  II  (3elect(tuplePtr) ) 
if  (project  !*  NULL)  { 
proj ect (tuplePtr ,  trsltTuple) ; 
yield(tr8ltTuple) ; 

} 

else 

yield  ((d8tType*)tuplePtr) ; 


Figure  4.16.  File  scan  cla.ss  and  implementation. 


4-18 


class  loops.join  [ 
dbclass  srcTypel  {}, 
dbclass  srcType2  {}, 
dbclass  dstType  {}, 

class  subQuery  (public:  iterator  srcTypel  *  next_tuple() ;  }, 
dbclass  innerRelType  {  public:  iteator  srcType2  ♦  scanO;  } 

] 

{ 

typedef  int  (♦matchFimc) (srcTypel*,  3rcType2*) ; 
typedef  void  (*joinFunc) (srcTypel* ,  srcTypa2*,  dstType*); 
subQuery  *  outer; 

innerReltype  *  inner; 
matchFunc  match; 

joinFunc  join; 

public: 

loopa_join(aubQuery* ,  innerRelType* ,  matchFunc,  joinFunc); 
iterator  dstType  *  next.tupleO ; 

}; 


loops.join:  .'loops. join(8ubQuery  *  query,  innerRelType  *  innerRel, 
matchFunc  matchPtr,  joinFunc  joinPtr)  { 
outer  =  query; 
inner  =  innerRel; 
match  =  matchPtr; 
join  =  joinPtr; 

} 

iterator  dstType  *  loops.join: :next.tuple()  { 
dstType  rsltTuple; 

iterate(srcTypel  *  outerTuple  =  outer  ->  next.tupleO ) 
iterate (srcType2  *  innerTuple  »  inner  ->  scanO) 

if  ((match  ■■  NULL)  II  match(outerTuple,  innerTuple))  { 
join(outerTuple,  innerTuple,  trsltTuple) ; 
yield(JtrsltTuple) ; 

} 


Figure  4.17.  Loops  join  class  and  implementation. 
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V.  Conclusion 


5.1  Summary 

This  thesis  effort  accomplished  several  objectives,  resulting  in  the  design  and  ini[)le- 
mentation  of  a  nested  relational  database  system  under  the  Exodus  extensible  database 
architecture.  The  key  objectives  include; 

•  Nesting  relations  within  relations. 

•  A  parser  to  create  and  maintain  a  persistent  data  dictionary. 

•  Implementing  the  Colby  relational  algebra. 

•  Implementing  operator  methods  for  selection,  projection,  and  natural  join  of  nested 

relations. 

An  initial  objective  of  this  effort  concerned  the  implementation  aspects  of  nested 
relations.  Through  the  use  of  the  collection  construct  provide  by  Exodus,  structures  wliich 
defined  attributes  of  a  particular  table  type  were  permitted  to  represent  single  entilie.s  or 
attributes  within  other  table  types.  By  mapping  corresponding  relations  to  collectioii.s. 
the  logical  nature  of  nested  relations  used  standard  procedures  thoughout  all  the  system 
components.  Also,  the  nested  relations  were  ea^y  to  navigate  because  of  their  logical 
structure. 

A  parsing  component  was  designed  and  implemented  to  permit  the  creation  of  nested 
relations.  The  data  dictionary  required  table  type  information  to  be  separated  into  its 
individual  attributes  and  placed  into  the  symbol  table  for  future  reference.  Since  nested 
relations  had  relations  as  their  attributes,  the  data  dictionary  allowed  previously  defined 
table  types  to  become  table  types  within  other  relations  against  which  their  relation-valued 
attributes  were  declared.  The  persistence  feature  of  Exodus  allowed  the  data  dictionary 
to  remain  in  the  storage  manager  between  program  executions. 

As  an  offshoot  of  parsing  data  for  the  data  dictionary,  a  format  to  enter  actual  data 
from  a  UNIX  file  into  a  relation  allowed  a  rapid  and  simple  method  of  loading  it  into  tlie 
relation.  The  UNIX  file  followed  a  logical  layout  for  the  data  file  in  conjunction  with  tlie 
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relation  it  was  to  fill.  The  creation  of  such  a  format,  logically  delimited  by  braces,  brackets, 
parenthesis,  and  comma,s,  provided  an  easy  vehicle  for  loading  data  into  nested  relations. 

The  relational  algebra  proposed  by  Colby  was  parsed  and  inserted  into  the  operator 
data  elements  of  the  query  tree.  The  query  nodes  and  the  tree  structure  resembled  a 
general  structure  required  by  the  query  optimizer.  Because  of  the  flexible  nature  of  the 
query  tree  and  the  nodes  which  described  it,  the  query  tree  represented  the  Colby  relational 
algebra  and  was,  thereby,  suitable  for  manipulating  nested  relations. 

The  methods  to  implement  the  project,  select,  and  natural  join  operators  resided  as 
object  files  in  the  ERTS.  Here,  query  code  wjis  compiled  and  linked  with  the  implemented 
methods  to  execute  the  query.  The  operator  methods  used  functions  to  recursively  descend 
relations,  applying  criteria  to  carry  out  the  necessary  functions. 

5.2  Future  Recommendations 

Several  enhancements  may  be  added  to  the  current  system  to  improve  future  use- 
ability.  First  of  all,  the  parser  currently  recognizes  the  three  primary  relational  operators, 
project,  select,  and  natural  join.  Additional  Colby  operators  may  be  added  to  the  system, 
including  nest,  unnest,  and  the  set  operations.  To  implement  these  operators,  additional 
methods  may  be  added  to  the  ERTS  along  with  access  methods.  Finally,  a  more  user 
friendly  query  language,  such  as  SQL/NF,  may  be  added  to  the  front  end  of  the  database 
system. 
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